Integrate ndarray-parallel and make rayon an optional feature #563
Conversation
Here I'm wondering: should we keep the parallel prelude separate, or pack it into the regular ndarray prelude? It only contains this:

```rust
pub mod prelude {
    pub use rayon::prelude::{ParallelIterator, IndexedParallelIterator,
        IntoParallelIterator, IntoParallelRefIterator, IntoParallelRefMutIterator};
}
```
cc @cuviper Any thoughts about our integration with rayon and our parallel iterators? We're just moving code here and pulling in the rayon 1.0 dep in the main crate, the way it's all supposed to be.
I lean towards keeping them separate, but putting them in … (Edit: earlier, I suggested adding … .) Another interesting idea is to reexport all of rayon:

```rust
// ndarray/src/lib.rs
pub use rayon;
```

although users could just depend directly on rayon themselves.
Reexporting rayon seems worthwhile.
One other idea for the prelude is to reexport the traits in hidden, renamed form:

```rust
// ndarray/src/prelude.rs
#[doc(hidden)]
pub use rayon::prelude::{
    ParallelIterator as __rayon_ParallelIterator,
    IndexedParallelIterator as __rayon_IndexedParallelIterator,
    IntoParallelIterator as __rayon_IntoParallelIterator,
    IntoParallelRefIterator as __rayon_IntoParallelRefIterator,
    IntoParallelRefMutIterator as __rayon_IntoParallelRefMutIterator,
};
```

I think …
We're merging breaking changes to prepare the next major release, right? Taken together with rayon being an opt-in feature, putting the rayon traits in the regular prelude seems like a breaking change / method-name-clash hazard. I think it's pretty obvious we should have rayon as optional, but maybe I'm wrong?
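For illustration, this is roughly what the opt-in dependency could look like in the manifest (a hypothetical snippet, not ndarray's actual Cargo.toml):

```toml
[dependencies]
# An optional dependency implicitly defines a crate feature of the same
# name, so downstream users opt in with: features = ["rayon"]
rayon = { version = "1.0", optional = true }
```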
Yes, that makes sense to me.
I think it's a good idea to make rayon optional for the time being, given the difference in compile times; we can always make it required later.
I think it's good to keep rayon optional even in the future. In my opinion, it's best not to force particular parallel dependencies on people. Groups can have very opinionated ideas about how best to parallelize their code. I've especially seen this in the C++ and C realm when you're moving towards Exascale-type work: you see very domain-specific solutions that, while not obvious, lead to noticeable performance differences.
@rcarson3 Fair enough; we don't have a plan for transparent/automatic parallelization anywhere, though. So far it's all just separate methods and parallel iterators one can use.
@bluss I'm aware of this; I've had the pleasure of using a few of the parallel iterators to get some easy parallelization out of my code :) I guess I've become a bit more cautious(?) about requiring one parallelization approach over another, even for parallelizing iterators/compute kernels, as I've become more and more exposed to the different ways people have found to tackle these types of problems for HPC applications. I'll probably still continue to use this crate no matter what decisions y'all end up making way down the road for the parallel stuff, since I enjoy the ease this crate has brought to writing numerical code in Rust :)
@rcarson3 Out of curiosity, have you done any HPC-type stuff in Rust? If so, what crates were useful? I have a small SLURM cluster but haven't yet tried running any Rust code across multiple machines. |
@jturner314 I haven't gotten the chance to mess around with any HPC stuff in Rust yet. Hopefully, I'll get the chance to experiment with some of this stuff in the next month or so. Also, I've been eyeballing rsmpi as a traditional method to accomplish HPC type stuff in Rust. It looks like some people from LANL are involved which means we might start seeing Rust used at the DOE national labs which is exciting. I've also been looking at timely-dataflow and differential-dataflow to try and figure out if they could be used in a similar manner to what you might use MPI for. |
@rcarson3 Thanks for the links. The dataflow projects look particularly exciting, and relatively easy-to-use. |
Put the parallel module further down in the crate file, to fix inherent method order. The order in the docs is important and is decided by the order of the modules in the main crate file.
Here we try to use the new rustdoc links. They should render fine on nightly, and thus on docs.rs; on stable, the '[' brackets are visible verbatim, but we hope most users are on docs.rs.
This way they are somewhat easier to decipher in rustdoc.
Easier to read in rustdoc.
This should be ready. It uses a separate parallel prelude, which contains the rayon traits and the par_azip macro.
I think the bounds on the …
Looks good to me! I noticed that views and zips are unindexed -- AFAIK their length is known, so is this just because they're hard to split at arbitrary indexes? (I have the same limitation in rayon-hash.)
Is there a technical reason why you're requiring `Sync`?
@cuviper They are impossible to split at arbitrary linear indexes if they have more than one dimension. I.e. a 3 × 3 × 3 array view should be split along a good combination of axis and index, for example along axis 0 at index 1 (that split produces the following pieces: views of 1 × 3 × 3 and 2 × 3 × 3; see also illustrations).

And we want to split them in a way that preserves element locality. This is a dynamic property. Maybe the elements are closest to each other along axis 1 and furthest from each other along axis 0; then we split along axis 0. We want to increase our chances of ending up with a contiguous chunk to work on for the final single job. We can split successively on different axes: for example, when the first axis has been split down to length 1, we go on to the current axis of furthest separation with len > 1 and split along that.

What could be added is making more producers indexed, in their 1D cases only, but I'm unsure if that's possible. Say we have an …

I think the Sync thing is just an oversight / “old quick solution to be safe”; I don't remember any reason we require that.
I see -- and I suppose direct … Edit: I think I mistook the macros for the indexed …
The … I think it would work fine to just implement …
Merging this stuff into ndarray makes things a lot nicer. Removing the `Ndarray*` traits is especially satisfying. :)
I've added some comments, primarily about docs. Some additional docs changes:

- Remove "See also the [`ndarray-parallel`] crate for integration with rayon." from `src/lib.rs`.
- Add the `rayon` feature flag to the list in `src/lib.rs`.

I'd also remove the `parallel` directory (or maybe delete everything other than `parallel/README.md` and update it to say that the `ndarray-parallel` crate is deprecated). This makes the Git diff a lot easier to read (since it causes Git to recognize that most files were renamed with only small changes).
src/parallel/ext_traits.rs (Outdated)

```rust
/// `Zip`.
///
/// Requires crate feature `rayon`.
pub fn par_apply<F>(self, function: F)
```
Consider moving `par_apply` out of this module and into `src/zip/mod.rs`, immediately after `apply`. (This would make the docs somewhat clearer.)
OK, this is left for later.
We don't need another dep just to have faster rayon benchmarks (they are for comparison anyway.)
@cuviper We have parallel iterators for array views, and for &Array and &mut Array, but not … Implementing indexed parallel iterators is left for later. This PR is mostly moving the code from a separate crate into the main crate.
OK, but note that having …
An ArrayView is to an Array what a slice is to a Vec, so I think it makes sense to implement the indexed parallel iterator on the array view; we don't have anything separate from array views that handles an &Array.
Right, but if I understand your earlier comment about preserving locality, a general … All I'm suggesting is:

```rust
// (with lots of omissions...)
impl IntoParallelIterator for &Array {
    type Iter = Parallel<&Array>;
}

impl ParallelIterator for Parallel<&Array> {
    fn drive_unindexed<C>(self, consumer: C) -> C::Result {
        // A delayed view conversion lets you do something else later
        self.0.view().into_par_iter().drive_unindexed(consumer)
    }
}
```
Aha, that makes sense. We'll have to see whether we can use that for extra features. In general, arrays and views have about the same properties in everything, including support for custom strided axes, so I'm not sure if I'm right about not grasping the concept.
This moves all the ndarray-parallel functionality directly into ndarray itself.

This means the parallelization features are more visible (in docs by default) and more accessible. The user still has to opt in to use it, and `use` the rayon traits.

We also get rid of extra shims in the integration, use the rayon IntoParallel* traits directly, and can use inherent methods for `par_apply` and so on.

Fixes #551